
@ZuseZ4
Member

@ZuseZ4 ZuseZ4 commented Nov 21, 2025

Automates step 1 from the rustc-dev-guide offload section:
https://rustc-dev-guide.rust-lang.org/offload/usage.html#compile-instructions
`"clang-offload-packager" "-o" "host.out" "--image=file=device.bc,triple=amdgcn-amd-amdhsa,arch=gfx90a,kind=openmp"`

Verified on an MI 250X
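For reference, the manual step quoted above can be reproduced from Rust with `std::process::Command`. The sketch below only assembles the invocation and prints it instead of spawning it. The helper `packager_command` is hypothetical and not part of rustc, and actually running the command would assume `clang-offload-packager` is on `PATH`.

```rust
use std::process::Command;

/// Hypothetical helper: builds (but does not spawn) the
/// `clang-offload-packager` invocation from the manual offload workflow.
fn packager_command(output: &str, device_bc: &str, triple: &str, arch: &str, kind: &str) -> Command {
    // The `--image` argument is a comma-separated list of key=value pairs.
    let image = format!("--image=file={device_bc},triple={triple},arch={arch},kind={kind}");
    let mut cmd = Command::new("clang-offload-packager");
    cmd.arg("-o").arg(output).arg(image);
    cmd
}

fn main() {
    let cmd = packager_command("host.out", "device.bc", "amdgcn-amd-amdhsa", "gfx90a", "openmp");
    // Print the assembled invocation instead of running it.
    println!("{:?}", cmd);
}
```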

cc @jhuber6, @kevinsala, @jdoerfert, @Sa4dUs

r? oli-obk

@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Nov 21, 2025

@ZuseZ4 ZuseZ4 force-pushed the automate-offload-packager branch from 2a3a561 to 80e8fce November 21, 2025 09:49
@rustbot rustbot added the A-rustc-dev-guide Area: rustc-dev-guide label Nov 21, 2025
@rustbot
Collaborator

rustbot commented Nov 21, 2025

oli-obk is not on the review rotation at the moment.
They may take a while to respond.

@ZuseZ4 ZuseZ4 marked this pull request as ready for review November 21, 2025 09:49
@rustbot
Collaborator

rustbot commented Nov 21, 2025

The rustc-dev-guide subtree was changed. If this PR only touches the dev guide, consider submitting a PR directly to rust-lang/rustc-dev-guide; otherwise, thank you for updating the dev guide with your changes.

cc @BoxyUwU, @jieyouxu, @Kobzol, @tshepang

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 21, 2025

@ZuseZ4 ZuseZ4 force-pushed the automate-offload-packager branch 3 times, most recently from 95a0037 to 519499d November 21, 2025 10:16
@ZuseZ4 ZuseZ4 force-pushed the automate-offload-packager branch from 519499d to 88ca3bc November 21, 2025 10:41
```cpp
// Diff context for the review thread below (C++ side of the change):
OS1.flush();
auto MB = llvm::MemoryBuffer::getMemBufferCopy(Storage, "module.bc");

SmallVector<char, 1024> BinaryData;
```
Contributor

This buffer could be an argument provided by rustc, and then rustc could do the file writing.

Member Author

The problem is that the C++ version is resizable. If we provide a buffer from Rust, it wouldn't be.
I asked, and there is no reasonable default size. So we'd pass in a (likely) too-small buffer, the C++ side would set the needed length and return false, Rust would see that, allocate a larger buffer of the requested size, call the function again, and hope that it now succeeds.
It's just three extra lines on the Rust side, and I don't expect it to become a compile-time bottleneck, since no one (famous last words) will compile more than 10k kernels, but it still feels ugly.
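The caller-provided-buffer alternative discussed above amounts to a two-call protocol. A minimal sketch in plain Rust, where `pack_into` is a hypothetical stand-in for the C++ function (it copies a fixed payload instead of serializing an offload binary) and `pack` is the Rust caller:

```rust
/// Hypothetical stand-in for the C++ side: writes `payload` into `buf` if it
/// fits. Always stores the required length in `needed`; returns `false` if
/// `buf` was too small.
fn pack_into(payload: &[u8], buf: &mut [u8], needed: &mut usize) -> bool {
    *needed = payload.len();
    if buf.len() < payload.len() {
        return false;
    }
    buf[..payload.len()].copy_from_slice(payload);
    true
}

/// Caller side: start with a guessed capacity and retry once with the
/// exact size the callee asked for.
fn pack(payload: &[u8], initial_guess: usize) -> Vec<u8> {
    let mut buf = vec![0u8; initial_guess];
    let mut needed = 0;
    if !pack_into(payload, &mut buf, &mut needed) {
        // Too small: reallocate with the requested size and call again.
        buf = vec![0u8; needed];
        let ok = pack_into(payload, &mut buf, &mut needed);
        assert!(ok, "second call must succeed with the requested size");
    }
    // Drop any unused tail if the first guess was larger than needed.
    buf.truncate(needed);
    buf
}

fn main() {
    let payload = b"offload-binary";
    let out = pack(payload, 4); // deliberately too small, forces the retry
    assert_eq!(out, payload);
}
```

With a growable `SmallVector` on the C++ side, none of this retry logic is needed, which is the trade-off weighed in the thread.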

Member Author

@ZuseZ4 ZuseZ4 Nov 22, 2025

I've looked deeper into the next steps, and I think it's probably not worth cleaning up this write, since it will be fused with the next step, where we consume the in-memory host.out file.

I'll implement a save-temps equivalent for debugging later, which will still write it out, but then we can handle all intermediate writes at once.
This is similar to my offload frontend, which Marcelo also just rewrote after we figured out what we actually need.
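The plan described above (keep intermediates such as `host.out` in memory by default and write them to disk only behind a debugging switch, in the spirit of clang's `-save-temps`) can be sketched as follows. `Intermediates`, its `save_temps` flag, and the output directory are hypothetical; this is not rustc's actual implementation.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical collection of in-memory intermediate artifacts.
struct Intermediates {
    save_temps: bool,
    files: Vec<(String, Vec<u8>)>,
}

impl Intermediates {
    fn new(save_temps: bool) -> Self {
        Intermediates { save_temps, files: Vec::new() }
    }

    /// Record an artifact; it stays in memory for the next pipeline step.
    fn add(&mut self, name: &str, bytes: Vec<u8>) {
        self.files.push((name.to_string(), bytes));
    }

    /// Handle all intermediate writes at once, and only when debugging.
    fn flush_to(&self, dir: &Path) -> io::Result<Vec<PathBuf>> {
        let mut written = Vec::new();
        if !self.save_temps {
            return Ok(written);
        }
        fs::create_dir_all(dir)?;
        for (name, bytes) in &self.files {
            let path = dir.join(name);
            fs::write(&path, bytes)?;
            written.push(path);
        }
        Ok(written)
    }
}

fn main() -> io::Result<()> {
    let mut temps = Intermediates::new(true);
    temps.add("host.out", b"packaged".to_vec());
    let dir = std::env::temp_dir().join("offload-save-temps-demo");
    for path in temps.flush_to(&dir)? {
        println!("wrote {}", path.display());
    }
    Ok(())
}
```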

Member Author

wip #149202

@oli-obk
Contributor

oli-obk commented Nov 22, 2025

@bors r+ rollup

@bors
Collaborator

bors commented Nov 22, 2025

📌 Commit 88ca3bc has been approved by oli-obk

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 22, 2025
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Nov 22, 2025
…r=oli-obk

automate gpu offloading - part 1

Automates step 1 from the rustc-dev-guide offload section:
https://rustc-dev-guide.rust-lang.org/offload/usage.html#compile-instructions
`"clang-offload-packager" "-o" "host.out" "--image=file=device.bc,triple=amdgcn-amd-amdhsa,arch=gfx90a,kind=openmp"`

Verified on an MI 250X

cc `@jhuber6,` `@kevinsala,` `@jdoerfert,` `@Sa4dUs`

r? oli-obk
bors added a commit that referenced this pull request Nov 22, 2025
Rollup of 8 pull requests

Successful merges:

 - #147536 (Add `rust-mingw` component for `*-windows-gnullvm` hosts)
 - #148407 (Warn against calls which mutate an interior mutable `const`-item)
 - #149168 (Fix ICE when collecting opaques from trait method declarations)
 - #149170 (automate gpu offloading - part 1)
 - #149180 (Couple of refactors to SharedEmitter)
 - #149185 (Handle cycles when checking impl candidates for `doc(hidden)`)
 - #149194 (Move safe computation out of unsafe block)
 - #149204 (Fix typo in HashMap performance comment)

r? `@ghost`
`@rustbot` modify labels: rollup

@rust-bors

rust-bors bot commented Nov 23, 2025

💔 Test for 3267d11 failed: CI. Failed jobs:

@ZuseZ4 ZuseZ4 force-pushed the automate-offload-packager branch from 4855d36 to 44ad4a0 November 23, 2025 08:12
@ZuseZ4
Member Author

ZuseZ4 commented Nov 23, 2025

@bors2 try jobs=dist-ohos-aarch64


rust-bors bot added a commit that referenced this pull request Nov 23, 2025
automate gpu offloading - part 1

try-job: dist-ohos-aarch64
@ZuseZ4 ZuseZ4 force-pushed the automate-offload-packager branch from 44ad4a0 to b31005e November 23, 2025 08:20
@ZuseZ4
Member Author

ZuseZ4 commented Nov 23, 2025

@bors2 try jobs=dist-ohos-aarch64


rust-bors bot added a commit that referenced this pull request Nov 23, 2025
automate gpu offloading - part 1

try-job: dist-ohos-aarch64
@ZuseZ4
Member Author

ZuseZ4 commented Nov 23, 2025

OK, now it works. In a new commit, I've copied the autodiff logic so that if `llvm.offload = false` (the default), we build and provide placeholder offload functions that are `unimplemented!()`. There's a small chance I copied slightly more logic from autodiff than necessary, but Marcelo just finished his offload intrinsic PR, and his next offload macro PR will need the missing pieces anyway.

I've verified that rustc builds on my server with `llvm.offload` both enabled and disabled.
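The fallback pattern described above can be sketched as a pair of `cfg`-gated modules with identical signatures: when offload support is compiled in, the real bindings are used; otherwise every entry point is a placeholder that calls `unimplemented!()`. The cfg name `llvm_offload` and the function `offload_package` are hypothetical, not the names used in rustc.

```rust
// Hypothetical cfg: would be set by the build system when offload is enabled.
#[cfg(llvm_offload)]
mod offload {
    pub fn offload_package(_device_bc: &[u8]) -> Vec<u8> {
        // Stands in for the real, LLVM-backed call.
        todo!("real binding")
    }
}

// Default build: same signatures, but reaching them at runtime is a bug.
#[cfg(not(llvm_offload))]
mod offload {
    pub fn offload_package(_device_bc: &[u8]) -> Vec<u8> {
        unimplemented!("built without offload support")
    }
}

fn main() {
    // Callers compile either way; only the runtime behavior differs.
    let result = std::panic::catch_unwind(|| offload::offload_package(b"bc"));
    println!("placeholder panicked: {}", result.is_err());
}
```

The point of the pattern is that the rest of the compiler never needs its own `cfg` checks: it always calls the same functions, and only a build without the feature that still reaches them fails loudly.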

@ZuseZ4 ZuseZ4 requested a review from oli-obk November 23, 2025 09:30
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 23, 2025
@rust-bors

rust-bors bot commented Nov 23, 2025

☀️ Try build successful (CI)
Build commit: 804a131 (804a13143bb52272e470ecbdeb4112ce7078986f, parent: e0e204f3e97ad5f79524b9c259dc38df606ed82c)

@oli-obk
Contributor

oli-obk commented Nov 23, 2025

@bors r+

@bors
Collaborator

bors commented Nov 23, 2025

📌 Commit b31005e has been approved by oli-obk

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 23, 2025
@bors
Collaborator

bors commented Nov 23, 2025

⌛ Testing commit b31005e with merge 23f7081...

@bors
Collaborator

bors commented Nov 23, 2025

☀️ Test successful - checks-actions
Approved by: oli-obk
Pushing 23f7081 to main...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Nov 23, 2025
@bors bors merged commit 23f7081 into rust-lang:main Nov 23, 2025
13 checks passed
@rustbot rustbot added this to the 1.93.0 milestone Nov 23, 2025
@github-actions
Contributor

What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing c268b39 (parent) -> 23f7081 (this PR)

Test differences


4 doctest diffs were found. These are ignored, as they are noisy.

Test dashboard

Run

```sh
cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard 23f708107b459ed551a860ef0bf8b61bc80b48b4 --output-dir test-dashboard
```

And then open test-dashboard/index.html in your browser to see an overview of all executed tests.

Job duration changes

  1. x86_64-gnu-llvm-20: 2900.0s -> 2213.9s (-23.7%)
  2. aarch64-apple: 9128.3s -> 6985.4s (-23.5%)
  3. x86_64-gnu-aux: 7548.8s -> 5958.4s (-21.1%)
  4. x86_64-gnu-distcheck: 8870.4s -> 7031.6s (-20.7%)
  5. x86_64-rust-for-linux: 3080.9s -> 2593.9s (-15.8%)
  6. pr-check-1: 1958.9s -> 1674.9s (-14.5%)
  7. i686-gnu-1: 8381.6s -> 7260.5s (-13.4%)
  8. x86_64-gnu-stable: 7320.2s -> 6378.6s (-12.9%)
  9. x86_64-gnu-tools: 3712.1s -> 3237.8s (-12.8%)
  10. aarch64-gnu-llvm-20-2: 2516.2s -> 2212.9s (-12.1%)
How to interpret the job duration changes?

Job durations can vary a lot, based on the actual runner instance
that executed the job, system noise, invalidated caches, etc. The table above is provided
mostly for t-infra members, for simpler debugging of potential CI slow-downs.
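The percentages in the job-duration list are relative changes, (new - old) / old × 100. A quick check against the first three entries:

```rust
/// Relative change from `old` to `new`, in percent.
fn pct_change(old: f64, new: f64) -> f64 {
    (new - old) / old * 100.0
}

fn main() {
    // (job, old seconds, new seconds) taken from the list above.
    let jobs = [
        ("x86_64-gnu-llvm-20", 2900.0, 2213.9),
        ("aarch64-apple", 9128.3, 6985.4),
        ("x86_64-gnu-aux", 7548.8, 5958.4),
    ];
    for (name, old, new) in jobs {
        println!("{name}: {:.1}%", pct_change(old, new));
    }
}
```

This prints -23.7%, -23.5%, and -21.1%, matching the report.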

@rust-timer
Collaborator

Finished benchmarking commit (23f7081): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (primary 4.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|:--|--:|--:|--:|
| Regressions ❌ (primary) | 4.1% | [4.1%, 4.1%] | 1 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 4.1% | [4.1%, 4.1%] | 1 |

Cycles

Results (primary 2.3%, secondary -3.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|:--|--:|--:|--:|
| Regressions ❌ (primary) | 2.3% | [2.3%, 2.3%] | 1 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -3.0% | [-3.0%, -3.0%] | 1 |
| All ❌✅ (primary) | 2.3% | [2.3%, 2.3%] | 1 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 469.133s -> 469.836s (0.15%)
Artifact size: 386.20 MiB -> 386.24 MiB (0.01%)

@ZuseZ4 ZuseZ4 mentioned this pull request Nov 24, 2025
@ZuseZ4 ZuseZ4 deleted the automate-offload-packager branch November 24, 2025 20:44

Labels

A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-rustc-dev-guide Area: rustc-dev-guide merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
